Quarantine: Fault Tolerance for Concurrent Servers with Data-Driven Selective Isolation

Authors

  • Thanumalayan Sankaranarayana Pillai
  • Andrea C. Arpaci-Dusseau
  • Remzi H. Arpaci-Dusseau
Abstract

We present Quarantine, a system that enables data-driven selective isolation within concurrent server applications. Instead of constructing arbitrary isolation boundaries between components, Quarantine collects data to learn where such boundaries should be placed, and then instantiates said barriers to improve reliability. We present the case for data-driven selective isolation, and discuss the challenges in realizing such a system.

Similar resources

Supporting server-level fault tolerance in concurrent-push-based parallel video servers

Parallel video servers have been proposed for building large-scale video-on-demand (VoD) systems from multiple low-cost servers. However, when adding more servers to scale up the capacity, system-level reliability will decrease as failure of any one of the servers will cripple the entire system. To tackle this reliability problem, this paper proposes and analyzes architectures to support server...

An approach to fault detection and correction in design of systems using of Turbo codes

We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...

Byzantine Fault Tolerant Execution of Long-running Distributed Applications

Long-running distributed applications that automate critical decision processes require Byzantine fault tolerance to ensure progress in spite of arbitrary failures. Existing replication protocols for data servers guarantee that externally requested operations execute correctly even if a bounded number of replicas fail arbitrarily. However, since these protocols only support passive state machin...

Minimal Byzantine Storage

Byzantine fault-tolerant storage systems can provide high availability in hazardous environments, but the redundant servers they require increase software development and hardware costs. In order to minimize the number of servers required to implement fault-tolerant storage services, we develop a new algorithm that uses a “Listeners” pattern of network communication to detect and resolve orderi...

Large-Scale Computation Not at the Cost of Expressiveness

We present Celias, a new concurrent programming model for data-intensive scalable computing. Celias supports many virtues commonly found in existing distributed programming frameworks, such as elastic scaling and fault tolerance, without sacrificing expressiveness. The key design idea of Celias is the concept of a microtask, as a scalable, fault-tolerant, and completely data-driven unit of comp...

Publication date: 2011